A Parallel Approach for Frequent Subgraph Mining in a Single Large Graph Using Spark
Abstract
Frequent subgraph mining (FSM) plays an important role in graph mining and has attracted a great deal of attention in areas such as bioinformatics, web data mining, and social networks. In this paper, we propose SSIGRAM (Spark based Single Graph Mining), a Spark-based parallel algorithm for frequent subgraph mining in a single large graph. To address the two computational challenges of FSM, we perform subgraph extension and support evaluation in parallel across all worker nodes of the distributed cluster. We also employ a heuristic search strategy and three novel optimizations in the support evaluation process (load balancing, pre-search pruning, and top-down pruning), which significantly improve performance. Extensive experiments on four real-world datasets demonstrate that the proposed algorithm outperforms the existing GRAMI (Graph Mining) algorithm by an order of magnitude on all datasets and can work with a lower support threshold.
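As a rough illustration of what evaluating support in parallel can look like, the sketch below scores candidate single-edge patterns against one labeled, undirected graph and keeps those that meet a threshold. It is not SSIGRAM's implementation. It rests on several assumptions that the abstract does not state: a thread pool stands in for Spark worker nodes, support is measured with the minimum-image-based (MNI) count commonly used for single-graph mining, and the names `mni_support` and `frequent_edge_patterns` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def mni_support(edges, labels, pattern):
    """MNI support of a single-edge pattern (label_a, label_b).

    Assumption: support is the minimum, over the two pattern nodes, of the
    number of distinct graph vertices that the node can be mapped to.
    """
    a, b = pattern
    images_a, images_b = set(), set()
    for u, v in edges:
        # The graph is undirected, so try both orientations of each edge.
        for x, y in ((u, v), (v, u)):
            if labels[x] == a and labels[y] == b:
                images_a.add(x)
                images_b.add(y)
    return min(len(images_a), len(images_b))


def frequent_edge_patterns(edges, labels, candidates, threshold, workers=4):
    """Evaluate each candidate independently; threads stand in for workers."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        supports = list(pool.map(lambda p: mni_support(edges, labels, p), candidates))
    return {p: s for p, s in zip(candidates, supports) if s >= threshold}


# Toy graph: vertex -> label, plus an undirected edge list.
labels = {0: "A", 1: "B", 2: "A", 3: "B", 4: "C"}
edges = [(0, 1), (2, 1), (2, 3), (3, 4)]
candidates = [("A", "B"), ("B", "C"), ("A", "C")]
result = frequent_edge_patterns(edges, labels, candidates, threshold=2)
```

Because each candidate is scored independently of the others, the evaluations can be distributed without coordination. Larger patterns would additionally need a subgraph isomorphism search, which this sketch leaves out.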
Similar resources
OO-FSG: An Object-Oriented Approach to Mine Frequent Subgraphs
Frequent subgraph mining (FSG) has always been an important issue in data mining. Several frequent subgraph mining methods have been developed for mining graph data. However, most of these are main memory algorithms in which scalability is a bigger issue. A few algorithms have opted for a relational approach that stores the graph data in relational tables. However, relational databases have the...
GRAMI: Frequent Subgraph and Pattern Mining in a Single Large Graph
Mining frequent subgraphs is an important operation on graphs; it is defined as finding all subgraphs that appear frequently in a database according to a given frequency threshold. Most existing work assumes a database of many small graphs, but modern applications, such as social networks, citation graphs, or protein-protein interactions in bioinformatics, are modeled as a single large graph. In...
Finding Frequent Subgraphs in a Single Graph based on Symmetry
Mining frequent subgraphs is a basic activity that plays an important role in mining graph data. In this paper an algorithm is proposed to find frequent subgraphs in a single large graph that has applications such as protein interactions, social networks, web interactions. One of the key operations required by any frequent subgraph discovery algorithm is to perform graph isomorphism. The propos...
The ParMol Package for Frequent Subgraph Mining
Mining for frequent subgraphs in a graph database has become a popular topic in the last years. Algorithms to solve this problem are used in chemoinformatics to find common molecular fragments in a database of molecules represented as two-dimensional graphs. However, the search process in arbitrary graph structures includes costly graph and subgraph isomorphism tests. In our ParMol package we h...
Mining Frequent Graph Sequence Patterns Induced by Vertices
The mining of a complete set of frequent subgraphs from labeled graph data has been studied extensively. Furthermore, much attention has recently been paid to frequent pattern mining from graph sequences (dynamic graphs or evolving graphs). In this paper, we define a novel class of subgraph subsequence called an “induced subgraph subsequence” to enable efficient mining of a complete set of freq...